GRAD-PROJECT


INDIAN PREMIER LEAGUE TEAM ANALYSIS


Shreejaa Talla
stalla@student.gsu.edu
Department of Computer Science
Georgia State University


Index

  1. Introduction
  2. Install and Import required libraries
  3. Data
  4. Data Cleaning
    1. All matches summary
    2. All batting summary
    3. All bowling summary
  5. Describing new designs
    1. Custom scatter plot
    2. Hierarchy-based networkx graph
  6. Data visualizations
  7. Dashboard of IPL team analysis
    1. Basic method
    2. Using HTML elements
    3. Running on localhost
    4. Save as .html file
  8. Tools used
  9. Challenges
  10. References

Introduction:

Running this project

  1. Run every cell in order, one step at a time.
  2. The end results are Dashboard Type1 and Dashboard Type2. Once their cells have run and the dashboards are rendered inline, use their interactive components to test the interactivity. A dashboard may take 1-2 minutes to load the data after a dropdown value changes.
  3. The .html and .pdf results obtained from Dashboard Type3 are provided in the folder. They are not interactive, as they have no connection to a server.

Install and Import required libraries

Install the following libraries from the conda command prompt on Windows:

  1. conda install -c conda-forge panel
  2. conda install -c conda-forge pydot
  3. conda install -c conda-forge networkx

Data

Matches Data

This data consists of 45 columns and 824 rows. It describes each Indian Premier League match: the year in which it was played, the scores of the two teams, the venue of the match, the match description, the toss decision, the innings, the umpire details and the winning team details.

Column details of matches data

Batting summary data

This data covers every innings of every match from the batting side, with information such as runs, running over, strike rate, number of fours, number of sixes, captain, batsman name, venue and commentary.

Column details of batting data

Bowling summary data

This table captures over-by-over and ball-by-ball data. It consists of bowling team information such as economy rate, the different types of runs conceded, maidens, no-balls and dot balls. It also provides information on each player across all the matches.

Column details of bowling data

Data Cleaning

In the data cleaning process, null and NaN values are replaced with the most appropriate values for the data. There are two processes. The first replaces every null in an integer or float column with 0 or 0.0, and every null in a string column with "unknown". The second replaces nulls and NaNs by forward and backward fill. After examining the data, forward and backward fill proved quite effective here, as most of the neighbouring values are similar. The data also contains a few outlier values such as "-", which were replaced by 0.
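The first process can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the DataFrame and its column names (runs, venue) are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real columns come from the IPL spreadsheets.
df = pd.DataFrame({
    "runs": [34.0, np.nan, 12.0],
    "venue": ["Mumbai", None, "Chennai"],
})

# Integer and float columns: missing values become 0.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(0)

# All remaining (text) columns: missing values become "unknown".
text_cols = df.columns.difference(numeric_cols)
df[text_cols] = df[text_cols].fillna("unknown")

print(df)
```

Filling by column type keeps numeric columns numeric, so later aggregations such as sums and means still work without extra conversion.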

Matches Data

Finding the number of null values present in each column.
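A null count per column can be obtained along these lines. The small matches frame and its column names are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the matches data.
matches = pd.DataFrame({
    "winner": ["Team A", None, "Team B"],
    "umpire": [None, None, "Umpire X"],
    "year": [2018, 2019, 2019],
})

# isnull() marks every missing cell; sum() then counts them per column.
null_counts = matches.isnull().sum()

# Keep only the columns that actually contain nulls, largest count first.
missing = null_counts[null_counts > 0].sort_values(ascending=False)
print(missing)
```

The same two lines apply unchanged to the batting and bowling tables.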

Batting Data

Finding the number of null values present in each column.

Deleting the unnecessary columns, replacing the outlier data with 0.0, and replacing all other NaN and null values using the forward and backward fill methods.
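These three steps might look like the sketch below. The column names (scorecard_url, strike_rate, captain) are hypothetical, and which columns count as unnecessary is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the batting data.
batting = pd.DataFrame({
    "scorecard_url": ["a", "b", "c", "d"],       # assumed to be unneeded
    "strike_rate": ["120.5", "-", "98.0", "-"],  # "-" marks an outlier entry
    "captain": [None, "Player X", None, "Player Y"],
})

# 1. Delete the unnecessary column.
batting = batting.drop(columns=["scorecard_url"])

# 2. Replace the "-" outliers with zero and convert the column to float.
batting["strike_rate"] = batting["strike_rate"].replace("-", "0").astype(float)

# 3. Forward fill, then backward fill to cover any leading gaps.
batting = batting.ffill().bfill()

print(batting)
```

Replacing the outliers before filling means a "-" entry ends up as 0.0 rather than inheriting a neighbouring row's value.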

Bowling data

Finding the number of null values present in each column.

Creating custom designs

In this project, two custom plots are introduced:

  1. Custom scatter plot
  2. Hierarchy-based networkx chart

Custom scatter plot design

Hierarchy based networkx graph design

The basic networkx graph is laid out by hierarchy_pos(), which receives the graph and the root node as parameters and re-arranges the graph around each successive parent. It is a recursive function: each call identifies the current parent node and passes it as input to the next call, and so on, building up the hierarchy of the graph [1].
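The recursion can be illustrated with a simplified sketch. It is not the function from [1]: to stay dependency-free it takes a plain dictionary mapping each node to its children instead of a networkx graph, and the parameter names and node labels are assumptions. The returned dictionary maps each node to an (x, y) pair, which is the form that networkx drawing functions accept through their pos argument.

```python
def hierarchy_pos(children, root, width=1.0, vert_gap=0.2,
                  x_center=0.5, y=0.0, pos=None):
    """Place `root` and all its descendants in a top-down tree layout."""
    if pos is None:
        pos = {}
    pos[root] = (x_center, y)

    kids = children.get(root, [])
    if kids:
        # Split the parent's horizontal span evenly among its children.
        dx = width / len(kids)
        next_x = x_center - width / 2 - dx / 2
        for kid in kids:
            next_x += dx
            # Each child becomes the parent of the next recursive call,
            # one level lower and confined to its own slice of width.
            hierarchy_pos(children, kid, width=dx, vert_gap=vert_gap,
                          x_center=next_x, y=y - vert_gap, pos=pos)
    return pos


# Hypothetical hierarchy: a team, its two groups, and two players.
tree = {
    "Team": ["Batsmen", "Bowlers"],
    "Batsmen": ["Player A", "Player B"],
}
pos = hierarchy_pos(tree, "Team")
print(pos)
```

Because every child is restricted to its own slice of the parent's width, sibling subtrees cannot overlap horizontally.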

Data Visualization

The color codes distinguish the batting team (orange) from the bowling team (purple). These colors were chosen after the well-known titles "Orange Cap", for the best batsman, and "Purple Cap", for the best bowler.

These are the graphs for bowling teams

The graphs for batting teams

  1. This graph visualizes the power hitters of the team, that is, the batsmen with the highest total runs across all the matches played for a particular team.
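The aggregation behind such a graph might be computed as in the sketch below. The column names (team, batsman, runs) and the helper power_hitters are hypothetical and are not taken from the project's code.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned batting data.
batting = pd.DataFrame({
    "team": ["Team A", "Team A", "Team A", "Team B", "Team A"],
    "batsman": ["P1", "P2", "P1", "P3", "P2"],
    "runs": [40, 25, 60, 70, 10],
})


def power_hitters(df, team, top_n=5):
    """Return the top_n batsmen of `team` ranked by total runs."""
    team_rows = df[df["team"] == team]
    return team_rows.groupby("batsman")["runs"].sum().nlargest(top_n)


top = power_hitters(batting, "Team A", top_n=2)
print(top)
```

The resulting Series can be handed to an axes-level bar plot, drawn in orange to match the batting color code.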

Dashboard

All the graphs are combined into a single function that creates a figure. Each graph is fitted into a subplot of that figure, and the contents of each subplot change dynamically based on the dropdown values declared in the next cell. Here, the data is read from Excel and all the cleaning processes are applied.

  1. The figure declared in the function create_main() has two columns and 8 rows, and holds 14 axes.
  2. Only axes-level plots are used, because the dashboard arrangement can display only one figure; this is why FacetGrid plots are avoided.
  3. The dashboard can take 1-2 minutes to load the data when a dropdown value changes, as it needs to analyse the data and rebuild the graphs.
  4. This dashboard can be displayed in four forms:
    1. Sample inline dashboard
    2. Sample HTML inline dashboard
    3. HTML dashboard on localhost
    4. Dashboard saved to an HTML file
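One way to fit 14 axes into a single figure with 8 rows and two columns is sketched below. The exact arrangement is an assumption: here the first two rows each hold one full-width plot and the remaining six rows hold two plots each. The function name create_main_sketch and the placeholder titles are hypothetical.

```python
from matplotlib.figure import Figure


def create_main_sketch(team="Team A"):
    """Build one figure holding 14 axes on an 8-row, 2-column grid."""
    fig = Figure(figsize=(14, 40))
    grid = fig.add_gridspec(nrows=8, ncols=2)

    axes = []
    # Assumed layout: rows 0-1 each hold one plot spanning both columns.
    for row in range(2):
        axes.append(fig.add_subplot(grid[row, :]))
    # Rows 2-7 hold two plots each, giving 2 + 12 = 14 axes in total.
    for row in range(2, 8):
        for col in range(2):
            axes.append(fig.add_subplot(grid[row, col]))

    # Each axes-level plot would be drawn by passing ax=... to the
    # plotting call; a title stands in for the real plot here.
    for number, ax in enumerate(axes, start=1):
        ax.set_title(f"{team}: plot {number}")
    return fig


fig = create_main_sketch("Team A")
print(len(fig.axes))
```

Axes-level functions draw onto the axes object they are given, which is what allows many plots to share one figure. Figure-level functions create their own figure and so cannot be placed into a subplot this way.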

Dashboard Type1 (Inline)

The panel library is used to build the dashboard. A dictionary of dropdown values is declared, whose keys must have the same names as the arguments of the function create_main(). Panel's interact function is then called with create_main() and the dropdown values as parameters [4].

Dashboard Type2 (Include HTML)

This dashboard includes different types of HTML elements; here, select and header elements are used. Heading1 and Heading describe each graph, and a function create_dashI is implemented that depends on the two dropdown values above. Bootstrap elements are used to arrange the graphs and dropdowns on the page. The result is shown below.

Dashboard Type3 (Localhost)

The same dashboard element is served by calling the function show, which assigns a port to the app and displays it as a web page. This approach may sometimes launch a plain application, which is why the two inline dashboards above are displayed; the output for this dashboard is provided in the folder.

Dashboard Type4 (Save to HTML)

HTML interaction does not work; the .html file is for reference and not to test the interactivity.

Tools used

  1. Matplotlib (To build subplots and custom plots)
  2. Seaborn (To design all the graphs)
  3. Pandas (To clean and process the data)
  4. Panel (Panel and its widgets are used to build different types of interactive dashboards for matplotlib)

Challenges

  1. The first challenge was to create a custom scatter plot with interactivity, and to resize the images when a dropdown value changes.
  2. Building the dashboard on top of seaborn and matplotlib was quite difficult, as only one source was available to work from [3].
  3. Only axes-level graphs were allowed within the dashboard, as it displays at most one figure at a time. Hence, all the graphs are combined into one figure for display, and all FacetGrid plots are left out.

References:

  1. https://stackoverflow.com/questions/29586520/can-one-get-hierarchical-graphs-from-networkx-with-python-3
  2. https://stackoverflow.com/questions/22566284/matplotlib-how-to-plot-images-instead-of-points
  3. https://panel.holoviz.org/reference/widgets/
  4. https://coderzcolumn.com/tutorials/data-science/how-to-create-dashboard-using-python-matplotlib-panel